BRTDP: An efficient solution for Bounded-Parameter Markov Decision Process

نویسندگان

  • Fernando L. Fussuma
  • Karina Valdivia Delgado
  • Leliane Nunes de Barros
چکیده

Bounded-parameter Markov decision process (BMDP) can be used to model sequential decision problems, where the transitions probabilities are not completely know and are given by intervals. One of the criteria used to solve that kind of problems is the maximin, i.e., the best action on the worst scenario. The algorithms to solve BMDPs that use this approach include interval value iteration and an extension of real time dynamic programming (Robust-LRTDP). In this paper, we introduce a new algorithm, named BRTDP, also based on real time dynamic programming that makes a different choice of the next state to be visited using upper and lower bounds of the optimal value function. The empirical evaluation of the algorithm shows that it converges faster than the state-of-the-art algorithms that solve BMDPs.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Bounded Parameter Markov Decision Processes Bounded Parameter Markov Decision Processes

In this paper, we introduce the notion of a bounded parameter Markov decision process as a generalization of the traditional exact MDP. A bounded parameter MDP is a set of exact MDPs speciied by giving upper and lower bounds on transition probabilities and rewards (all the MDPs in the set share the same state and action space). Bounded parameter MDPs can be used to represent variation or uncert...

متن کامل

Bounded Parameter Markov Decision Processes with Average Reward Criterion

Bounded parameter Markov Decision Processes (BMDPs) address the issue of dealing with uncertainty in the parameters of a Markov Decision Process (MDP). Unlike the case of an MDP, the notion of an optimal policy for a BMDP is not entirely straightforward. We consider two notions of optimality based on optimistic and pessimistic criteria. These have been analyzed for discounted BMDPs. Here we pro...

متن کامل

Structured Parameter Elicitation

The behavior of a complex system often depends on parameters whose values are unknown in advance. To operate effectively, an autonomous agent must actively gather information on the parameter values while progressing towards its goal. We call this problem parameter elicitation. Partially observable Markov decision processes (POMDPs) provide a principled framework for such uncertainty planning t...

متن کامل

Factored Markov decision processes with Imprecise Transition Probabilities

This paper presents a short survey of the research we have carried out on planning under uncertainty where we consider different forms of imprecision on the probability transition functions. Our main results are on efficient solutions for Markov Decision Process with Imprecise Transition Probabilities (MDP-IPs), a generalization of a Markov Decision Process where the imprecise probabilities are...

متن کامل

Symbolic Bounded Real-Time Dynamic Programming

Real-time dynamic programming (RTDP) solves Markov decision processes (MDPs) when the initial state is restricted. By visiting (and updating) only a fraction of the state space, this approach can be used to solve problems with intractably large state space. In order to improve the performance of RTDP, a variant based on symbolic representation was proposed, named sRTDP. Traditional RTDP approac...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014